Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - MAP decoding and evaluation
نویسندگان
چکیده
The Hidden Dynamic Model (HDM) has been an attractive acoustic modeling approach because it provides a computational model for coarticulation and the dynamics of human speech. However, the lack of a direct decoding algorithm has been a barrier to research progress on HDM. We have developed a new HDM-based acoustic model, the Hidden-Trajectory HMM (HTHMM), which combines the state/mixture topology of a traditional monophone HMM with a target-directed hidden-trajectory model (a special form of HDM) for coarticulation modeling. Because the classical Viterbi algorithm is not admissible, we have developed a novel MAP decoding algorithm for HTHMM that correctly takes the hidden continuous trajectory into account. This paper introduces our new HTHMM decoder that allows for the first time to evaluate an HDM-type model by direct decoding instead of N -best rescoring. Using direct decoding, we demonstrate that the coarticulatory mechanism of our HTHMM matches traditional context-dependent modeling (enumeration of model parameters): The context-independent HTHMM has slightly better accuracy than a crossword-triphone HMM on the Aurora2 task. The decoder also enables us to include state-boundary optimization into the HDM/HTHMM training procedure. This paper presents the detailed decoding algorithm and evaluation results, while in [1] we present the HTHMM model itself and parameter training.
منابع مشابه
Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - model and training
We propose and evaluate a new acoustic model that combines HMM and a special type of the hidden dynamic model (HDM) – a target-directed hidden trajectory model – into a single integrated model named HTHMM. The new model provides a computational model of coarticulation by representing the internal dynamics of human speech based on the hidden trajectory of the vocal-tract resonances. This paper f...
متن کاملNovel Acoustic Modeling with Structured Hidden Dynamics for Speech Coarticulation and Reduction
We report in this paper our recent progress on the new development, implementation, and evaluation of the structured speech model with statistically characterized hidden trajectories. Unidirectionality in coarticulation modeling in such hidden trajectory models as presented in previous EARS workshops has been extended to bi-directionality (forward as well as backward in the temporal dimension),...
متن کاملA Generative Modeling Framework for Structured Hidden Speech Dynamics
We outline a structured speech model, as a special and perhaps extreme form of probabilistic generative modeling. The model is equipped with long-contextual-span capabilities that are missing in the HMM approach. Compact (and physically meaningful) parameterization of the model is made possible by the continuity constraint in the hidden vocal tract resonance (VTR) domain. The target-directed VT...
متن کاملSpeech Parameter Sequence Modeling with Latent Trajectory Hidden Markov Model
The weakness of hidden Markov models (HMMs) is that they have difficulty in modeling and capturing the local dynamics of feature sequences due to the piecewise stationarity assumption and the conditional independence assumption on feature sequences. Traditionally, in speech recognition systems, this limitation has been circumvented by appending dynamic (delta and delta-delta) components to the ...
متن کاملA Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency
The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediate preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, it is believed that they are insufficient in capturing all of the coarticulation effects. A wider phonetic context seems to be more appropriate, but often suffers from...
متن کامل